automatic translation
HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
Nasser, Ahmed, Mohamed, Marwan, Sherif, Alaa, Mahmoud, Basmala, Yehia, Shereen, Saad, Asmaa, El-Rahmany, Mariam S., Mohamed, Ensaf H.
Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single glyph can have multiple meanings. Deep learning translation applications are evolving rapidly, producing remarkable results that significantly impact our lives. In this research, we propose a method for the automatic recognition and translation of ancient Egyptian hieroglyphs from images to English. This study utilized two datasets for classification and translation: the Morris Franken dataset and the EgyptianTranslation dataset. Our approach is divided into three stages: segmentation (using Contour and Detectron2), mapping symbols to Gardiner codes, and translation (using the CNN model). The model achieved a BLEU score of 42.2, a significant result compared to previous research.
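For readers unfamiliar with the metric reported above, the following is a minimal sketch of corpus-level BLEU on a 0 to 100 scale, assuming a single reference per sentence, whitespace tokenization, and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative, and the abstract does not say which BLEU implementation or settings the authors used, so scores from this sketch are not directly comparable to the reported 42.2.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis, no smoothing."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Toy sentences invented for illustration only.
    reference = "the king gives an offering to the gods".split()
    hypothesis = "the king gives an offering to".split()
    print(round(corpus_bleu([hypothesis], [reference]), 2))
```

In the toy example the hypothesis is a strict prefix of the reference, so every n-gram precision is 1 and the score is lowered only by the brevity penalty.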
iLSU-T: an Open Dataset for Uruguayan Sign Language Translation
Stassi, Ariel E., Boria, Yanina, Di Martino, J. Matías, Randall, Gregory
Automatic sign language translation has gained particular interest in the computer vision and computational linguistics communities in recent years. Because each country's sign language has its own particularities, machine translation requires local data to develop new techniques and adapt existing ones. This work presents iLSU-T, an open dataset of interpreted Uruguayan Sign Language RGB videos with audio and text transcriptions. This type of multimodal and curated data is paramount for developing novel approaches to understand or generate tools for sign language processing. iLSU-T comprises more than 185 hours of interpreted sign language videos from public TV broadcasting. It covers diverse topics and includes the participation of 18 professional sign language interpreters. A series of experiments using three state-of-the-art translation algorithms is presented. The aim is to establish a baseline for this dataset and to evaluate its usefulness and the proposed data-processing pipeline. The experiments highlight the need for more localized datasets for sign language translation and understanding, which are critical for developing novel tools to improve accessibility and inclusion of all individuals. Our data and code can be accessed.
MiniF2F in Rocq: Automatic Translation Between Proof Assistants -- A Case Study
Viennot, Jules, Baudart, Guillaume, Arias, Emilio Jesùs Gallego, Lelarge, Marc
In this work, we conduct an experiment using state-of-the-art LLMs to translate MiniF2F into Rocq. The translation task focuses on generating a Rocq theorem based on three sources: a natural language description, the Lean formalization, and the Isabelle formalization. We conducted our experiment in three stages of increasing complexity, from basic one-shot prompting to multi-turn conversations that incorporate feedback from unsuccessful attempts. At each stage, we perform multiple rounds of translation using increasingly advanced models: GPT-4o mini, Claude 3.5 Sonnet, o1 mini, and o1. We successfully translated 478 out of 488 theorems. The dataset is open source: https://github.com/LLM4Rocq/miniF2F-rocq.
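The escalation strategy described above (retrying with feedback from failed attempts, then moving to a stronger model) can be sketched roughly as follows. This is a simplified illustration and not the authors' code: `translate_with_feedback`, `translate`, `check`, and `max_turns` are hypothetical names, and the checker stands in for whatever validation the authors applied (for instance, whether Rocq accepts the generated statement, which is an assumption here).

```python
from typing import Callable, Optional, Sequence, Tuple


def translate_with_feedback(
    sources: dict,
    models: Sequence[str],
    translate: Callable[[str, dict, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_turns: int = 3,
) -> Optional[str]:
    """Try each model in order, from weakest to strongest.

    Within one model, a failed attempt's error message is fed back into
    the next turn. Returns the first accepted candidate, or None.
    """
    for model in models:
        feedback = None  # each model starts a fresh conversation
        for _ in range(max_turns):
            candidate = translate(model, sources, feedback)
            ok, message = check(candidate)
            if ok:
                return candidate
            feedback = message  # carry the failure into the next turn
    return None


if __name__ == "__main__":
    # Stub translator and checker, invented purely to exercise the loop.
    def fake_translate(model, sources, feedback):
        return f"{model}:{'fixed' if feedback else 'draft'}"

    def fake_check(candidate):
        accepted = candidate == "strong:fixed"
        return accepted, "" if accepted else "error: statement rejected"

    sources = {"natural_language": "...", "lean": "...", "isabelle": "..."}
    print(translate_with_feedback(sources, ["weak", "strong"], fake_translate, fake_check))
```

Passing the translator and checker in as callables keeps the loop independent of any particular LLM client or proof-assistant toolchain.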
Spanish and LLM Benchmarks: is MMLU Lost in Translation?
Plaza, Irene, Melero, Nina, del Pozo, Cristina, Conde, Javier, Reviriego, Pedro, Mayor-Rocher, Marina, Grandury, María
The evaluation of Large Language Models (LLMs) is a key element in their continuous improvement process, and many benchmarks have been developed to assess the performance of LLMs in different tasks and topics. As LLMs become adopted worldwide, evaluating them in languages other than English is increasingly important. However, most LLM benchmarks are simply translated using an automated tool and then run in the target language. This means that the results depend not only on the LLM performance in that language but also on the quality of the translation. In this paper, we consider the case of the well-known Massive Multitask Language Understanding (MMLU) benchmark. Selected categories of the benchmark are translated into Spanish using Azure Translator and ChatGPT4 and run on ChatGPT4. Next, the results are processed to identify the test items that produce different answers in Spanish and English. Those are then analyzed manually to understand if the automatic translation caused the change. The results show that a significant fraction of the failing items can be attributed to mistakes in the translation of the benchmark. These results make a strong case for improving benchmarks in languages other than English, at least by revising the translations of the items and preferably by having experts adapt the tests to the target language.
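The filtering step described above, which finds items whose answers differ between the English and Spanish runs, could look roughly like the sketch below. The `ItemResult` record, its field names, and the sample data are hypothetical and only illustrate the comparison. The manual analysis of each flagged item is not automated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemResult:
    item_id: str
    gold: str        # correct option label, e.g. "B"
    answer_en: str   # model answer on the original English item
    answer_es: str   # model answer on the translated Spanish item


def diverging_items(results: List[ItemResult]) -> List[ItemResult]:
    """Items where the model answered differently in the two languages."""
    return [r for r in results if r.answer_en != r.answer_es]


def translation_suspects(results: List[ItemResult]) -> List[ItemResult]:
    """Items answered correctly in English but incorrectly in Spanish.

    These are candidates for manual review, not confirmed translation errors.
    """
    return [r for r in results if r.answer_en == r.gold and r.answer_es != r.gold]


if __name__ == "__main__":
    # Invented sample results for illustration only.
    results = [
        ItemResult("q1", "A", "A", "A"),
        ItemResult("q2", "B", "B", "C"),
        ItemResult("q3", "D", "C", "D"),
    ]
    print([r.item_id for r in diverging_items(results)])
    print([r.item_id for r in translation_suspects(results)])
```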
The End of Foreign-Language Education
A few days ago, I watched a video of myself talking in perfect Chinese. I've been studying the language on and off for only a few years, and I'm far from fluent. But there I was, pronouncing each character flawlessly in the correct tone, just as a native speaker would. Gone were my grammar mistakes and awkward pauses, replaced by a smooth and slightly alien-sounding voice. "My favorite food is sushi," I said--wo zui xihuan de shiwu shi shousi--with no hint of excitement or joy.
Gmail language translation finally appears on mobile
We've all taken automatic language translation for granted, especially where web pages are concerned. But Google has finally added it to where it might be the most useful: the mobile version of Gmail. To be fair, automatic translation has been a feature of Gmail for years, but only on the web. Now, Google is building it into its mobile app, where millions of people access their email every day. Android users will see automatic translation within their Gmail app via an update that begins rolling out today, August 8; iOS users will receive an update as early as August 21.
Airbnb upgrades app with automatic translations, verified WiFi and more as international travel picks up
Airbnb is making it easier for international travelers to book their stays just as the U.S. reopens its borders to foreign travelers. The short-term rental platform is introducing a translation engine that will automatically translate reviews and listing descriptions in over 60 languages. The feature is set to launch before the end of the year. "Translation Engine improves the quality of more than 99% of Airbnb listings," the company said in a Tuesday news release. "Translation Engine uses millions of Airbnb data points to improve translations, so it will get even smarter over time as it learns from new content that's submitted."
Zoom will add real-time automatic translation to videoconferences after buying the company Kites
Video calling platforms and apps have taken on an unprecedented role since the arrival of Covid-19. One of the most important and popular is Zoom, which will now add a new real-time machine translation feature after announcing the purchase of the communications company Kites. Through its official blog, Zoom announced that it is in negotiations to acquire Karlsruhe Information Technology Solutions, abbreviated Kites. It is a German startup "dedicated to the development of real-time machine translation solutions," or MT. Zoom said that the acquisition of Kites represents the possibility of eliminating the language gaps between its users.
Understanding Natural Language Processing Centric Digital
From its inception, the Internet has been a massive and constantly expanding conglomeration of unstructured data: articles, commentary, forums, secure networks, assets, and so on. Businesses, however, primarily operate on structured data. So the potential for providing actionable insights from Internet data must be derived through finding some methodology of analysis to structure it. When the Internet grew into a vast repository of potentially useful data, the concept of processing information as human communication was redefined. Natural language processing -- which began as "machine translation" in the 1950s and was focused on automatic translation between English and Russian -- truly came into its own in the context.
Deep Learning: New steps for Natural Language Processing
Natural Language Processing (NLP) of texts has been applied with varying degrees of success. For example, automatic translation attracted a lot of attention in the early stages of NLP. Nowadays, with the advent of social networks, users generate a large volume of information that is valuable to companies seeking either user feedback on their products or personalised information to help sell new ones. Thus, interesting new NLP applications have appeared, such as sentiment analysis (extracting the sentiment expressed in a user's opinion about a product), detection of user wants and needs, and user profiling. Humans cannot process this information in a timely manner without great effort and expense, and computers stand out as the only alternative because they are much faster than humans.